
Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) Publication number : 0 582 490 A2

EUROPEAN PATENT APPLICATION

(21) Application number : 93306265.5
(22) Date of filing : 09.08.93

(51) Int. Cl.5 : G06K 9/00, G06K 9/48

(30) Priority : 07.08.92 US 926198

(43) Date of publication of application :
09.02.94 Bulletin 94/06

(84) Designated Contracting States :
DE FR GB

(71) Applicant : R.R. DONNELLEY & SONS COMPANY
77 West Wacker Drive
Chicago, Illinois 60601-1696 (US)

(72) Inventor : Bengtson, Michael
22913 East Drive
Richton Park 60471, Illinois (US)

(74) Representative : Marshall, John Grahame
SERJEANTS 25 The Crescent King Street
Leicester LE1 6RX (GB)

(54) Converting bitmap data into page definition language commands.

(57) A method of and apparatus for converting an 
original representation of a page element exp- 
ressed in bitmap form into a page definition 
language representation of the page element 
develops an element approximation expressed 
in the page definition language, converts the 
element approximation into an approximation 
bitmap and compares the approximation bit- 
map to the original representation expressed in 
bitmap form to obtain an error indication. The 
error indication is checked to determine 
whether it meets a certain criterion and, if so, 
the element approximation is used as the page 
definition language representation. Otherwise, 
one or more further element approximations 
are developed until an element approximation 
is obtained that results in an error indication 
which meets the certain criterion.

Technical Field



The present invention relates generally to meth- 
ods and systems for converting data, and more par- 
ticularly to a method of and system for converting im- 
age data in bitmap form into page definition language 
commands.

Background Art 

Often, it is desired to reproduce a book that has 
been taken out of print and for which printing plates 
are no longer available. One way to effectuate this re- 
sult is to photograph the printed pages and use the re- 
sulting film images to create new printing plates. This 
method has the disadvantage of introducing noise 
into the reproduction process that in turn degrades 
the quality of the reproduced pages. Other reproduc- 
tion methods, such as photocopying, result in pages 
of even poorer quality, and hence are not acceptable 
under most circumstances. 

A still further reproduction method relies upon 
the use of optical character recognition (OCR) tech- 
niques wherein the pages to be reproduced are elec- 
tronically scanned to develop an electronic file repre- 
senting the characters on the page. Modern OCR 
techniques, however, cannot process nontext images,
are limited in their recognition capability and require
knowledge of the font in which the page characters
are printed in order for sufficient accuracy to be
obtained. This OCR reproduction method is thus re- 
stricted to those books or other printed material util- 
izing fonts that can be recognized. Such a restriction 
severely limits the types of source materials that can 
be reproduced. In addition, such a reproduction meth- 
od does not retain information concerning the format 
or style of each page. 

In recent years, page description languages
(PDLs), such as PostScript from Adobe Systems, Inc.
of Mountain View, California, have been developed
in an attempt to provide a standardized way of
describing a printed page.

Methods and systems have been known for converting
data expressed in a PDL into bitmap form.
Typically, the PDL expresses page elements, such as
images, line art or characters, as a series of short- 
hand expressions indicating the location of the page 
element and its appearance. The bitmap representa- 
tion, on the other hand, comprises a series of digital 
values defining the page on a pixel-by-pixel basis. 
Such converters, otherwise known as raster image 
processors (RIP's), are used to drive printers or other 
output devices that do not include an interpreter for 
the page definition language. 

Summary of the Invention 

In accordance with the present invention, a method
of and system for converting data in bitmap format
into page definition language facilitates reproduction
of printed pages in a simple and accurate manner.

More particularly, a method of converting an original
representation of an image expressed in bitmap
form into a page definition language representation of
the image includes the steps of establishing a first set
of recognition parameters, using the established set
of recognition parameters to convert the original
representation into an image approximation expressed in
the page definition language and converting the image
approximation into an approximation bitmap. The
approximation bitmap is compared to the original
representation expressed in bitmap form to obtain an
error indication. A determination is made whether the
error indication meets a certain criterion, and, if so,
the image approximation is used as the page definition
language representation. Otherwise, one or more
further image approximations expressed in the page
definition language are derived, converted into
approximation bitmaps and compared to the original
representation to obtain one or more further error
indications. The further error indications are checked to
determine whether each meets the certain criterion and,
if so, one of the further image approximations is used
as the page definition language representation.
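
By way of illustration only, the convert, re-render and compare loop summarized above may be sketched in Python as follows. The callables recognize, rasterize and acceptable are hypothetical placeholders for the recognition programming, a raster image processor and the error criterion; none of these names comes from the specification, and the list of parameter sets stands in for however the further sets of recognition parameters are obtained.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixel values


def error_bitmap(original: Bitmap, approximation: Bitmap) -> Bitmap:
    """Pixelwise difference; 1 marks a pixel where the two bitmaps disagree."""
    return [[abs(o - a) for o, a in zip(orow, arow)]
            for orow, arow in zip(original, approximation)]


def convert_page(original: Bitmap,
                 parameter_sets: Iterable[object],
                 recognize: Callable[[Bitmap, object], str],
                 rasterize: Callable[[str], Bitmap],
                 acceptable: Callable[[Bitmap], bool]) -> Tuple[Optional[str], int]:
    """Try each parameter set until the re-rendered page is close enough.

    recognize(bitmap, params) returns PDL text        (hypothetical)
    rasterize(pdl) returns an approximation bitmap    (hypothetical RIP stand-in)
    acceptable(error_bitmap) returns True or False    (hypothetical criterion)
    Returns (pdl, passes used), or (None, passes used) if no set succeeded.
    """
    passes = 0
    for params in parameter_sets:
        passes += 1
        pdl = recognize(original, params)          # bitmap to PDL approximation
        approximation = rasterize(pdl)             # PDL back to a bitmap
        if acceptable(error_bitmap(original, approximation)):
            return pdl, passes
    return None, passes
```

In this sketch the number of parameter sets supplied bounds the number of passes, so the loop always terminates even when no approximation meets the criterion.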

In accordance with another aspect of the present
invention, a method of converting a bitmap representation
of a character expressed in a font into a page
definition language expression of the character
includes the steps of detecting a characteristic of the
character and using the detected characteristic to
obtain successive estimates of the identity of the
character and the font. A determination is made as to
whether the successive estimates are the same and
a page definition language expression of the character
and the font is developed using one of the estimates
if the successive estimates are the same. If the
successive estimates are not the same, one or more
further successive estimates are obtained and compared
until two are the same, whereupon a page definition
language expression of the character and font
is developed using one of the estimates.

In accordance with yet another aspect of the
present invention, a method of reproducing a plurality
of characters each printed in a font at a position on a
page includes the steps of converting the printed
characters into a bitmap representation of same,
selecting a first character and detecting characteristics
thereof. The detected characteristics are utilized to
develop character and font data representing the
identity of the character and the font in which the
character is expressed. The character and font data
are stored together with position data representing
the position of the character on the page.
Characteristics of the remaining characters on the page are
detected and the character and font data representing
the identity of the characters and the fonts in which
the characters are expressed are stored together
with further position data representing the positions
of the characters on the page. The stored character
and font data and the position data are converted into
page definition language expressions and the page
definition language expressions are utilized to
operate a printing device such that it produces a printed
page.

In accordance with a still further aspect of the 
present invention, a system capable of commanding 
a printing device to reproduce a page having a plural- 
ity of characters printed thereon wherein each char- 
acter has an identity and is printed in a font and 
wherein the printed page is represented by a bitmap 
representation includes means for detecting metrics 
of each character of the bitmap representation. 
Means are responsive to the detecting means for ob- 
taining an estimate of each character including the 
identity thereof and the font in which such character 
is printed. Means are responsive to the obtaining 
means for comparing the estimates of the characters 
with the bitmap representation to obtain an error in- 
dication. Means are responsive to the comparing 
means for successively correcting character esti- 
mates until the error indication meets a certain criter- 
ion and means are provided for assembling printing 
device commands in a page definition language using 
the character estimates. 

The present invention permits a book or other 
printed matter to be reproduced in a manner which not 
only conveys the informational content therein, but 
also the appearance of the printed page in a substan- 
tially exact manner. 

Brief Description of the Drawings 

Figure 1 comprises a simplified block diagram of 
the system according to the present invention; 
Figures 2A and 2B, when joined along the simi- 
larly lettered lines, together comprise a general- 
ized flowchart of programming executed by the 
computer of Figure 1 to convert a bitmap repre- 
sentation of a printed page into a page definition 
language (PDL) file; 

Figure 3 comprises a more specific flowchart of
the programming executed by the block 38 of
Figure 2A;

Figures 4A and 4B, when joined along the simi- 
larly lettered lines, together comprise a more 
specific flowchart of the programming executed 
by the block 64 of Figure 3; 
Figure 5 comprises a more specific flowchart of 
programming executed by the blocks 78 and 80 
of Figure 4A; 

Figure 6 comprises a more specific flowchart of 
the programming executed by the block 72 of Fig- 
ure 4A; 

Figure 7 comprises a more specific flowchart of
programming executed by the blocks 62 and 66
of Figure 3;

Figure 8 comprises a more specific flowchart of
programming executed by the block 36 of Figure
2A and by the block 58 of Figure 2B;

Figure 9 comprises a more specific flowchart of
programming executed by the block 48 of Figure
2B; and

Figure 10 comprises a representation of an error
bitmap illustrating calculation of error statistics by
the blocks 160, 168 and 170 of Figure 9.

Description of the Preferred Embodiment 

Referring now to Figure 1, a system 10 converts
a printed page 12 into a series of page definition
language (PDL) expressions or commands suitable for
one or more output devices 14. The system 10
includes a computer 16 that may comprise, for example,
a commercially available personal computer having
a keyboard 18 and a video display terminal (VDT)
20. The computer 16 receives a bitmap representation
of the page 12 from a scanner 22, which scans
the page 12 on a pixel-by-pixel basis and develops
digital values representing the density of each pixel
of the page. If desired, the scanner 22 may be
replaced by any device capable of digitizing a printed
page.

In the preferred embodiment, the computer 16
converts the bitmap representation developed by the
scanner 22 into the PostScript page definition
language developed by Adobe Systems, Inc. of Mountain
View, California. The computer 16 may alternatively
develop commands or expressions in a different page
description language, if desired. The PDL commands
are used to operate a printer or one or more other
output devices to reproduce the printed page 12. The
commands or expressions may alternatively be
delivered to a storage unit 24 for later processing, if
desired.

Figures 2A and 2B generally illustrate the
programming executed by the computer 16 to effectuate
the foregoing result. Processing begins at a block 30
that sets an iteration counter N equal to one. A block
32 then obtains the bitmap representation of the page
12 from the scanner 22. A block 34 checks to
determine whether the iteration counter is equal to one
and, if so, a block 36 permits an operator to establish
initial recognition parameters that will later be used to
estimate the identity of page characters and the font
in which the characters are expressed.

Following the block 36, a block 38 converts the
original bitmap representation of the page 12 into a
PDL file. Referring again to Figure 1, the block 38
separates the page 12 into a non-text portion 40,
which may include, for example, page elements such
as graphic images and line art, and a text portion 42
containing page elements in the form of characters
expressed in a font. The portions 40 and 42 are
converted into PDL expressions separately and are
later merged [...]. Following the block 38, a block 44
executes a raster image processor (RIP) program to
convert the assembled PDL file into an approximation
bitmap file. A block 46, Figure 2B, then compares the
approximation bitmap file developed by the block 44 to
the original bitmap representation of the page 12. In
the preferred embodiment, this comparison is undertaken
by subtracting the approximation bitmap file from the
original bitmap file to obtain an error [...] characters
or fonts having characteristics that do not precisely
match stored characteristics. [...] thus develops an
error indication [...] one or more error detection
criteria by a block 48.

Following the block 48, [...] reproduced page are
acceptable. [...] such a determination may be made even
[...] or fonts have not been recognized [...] reasonable
recognition options [...] PDL expressions [...] commands
in the storage [...] by the block 50, a block 54 checks
to determine whether the [...] limit has been [...]
MAX1. If so, then [...] otherwise, the [...] block 34,
which again checks to determine whether [...] second and
subsequent passes through this portion of the program, N
is greater than [...] into a further approximation bitmap
file and is compared [...] the block 50 as to whether
the errors are [...] iteration counter is incremented
and the recogni[...] used to operate the output device
or devices 14.

Figure 3 illustrates the programming executed by [...]
expression for each image block. A block 64 then [...]
Figures 4A and 4B [...] In like fashion, a block 80
[...] the program) or the block 58 (in the course of
subsequent passes through the program).

A block 82 then checks to determine whether
successive character identity and font identity
estimates are the same. This is undertaken by checking
to determine whether the variables C_NEW and F_NEW
are equal to the variables C_OLD and F_OLD, respectively.
During the first pass through the programming shown
in Figure 4A, the variable C_NEW will not be equal to the
variable C_OLD and the variable F_NEW will not be equal
to the variable F_OLD. Thus, control passes to a block
84, which increments the loop counter M, and a
determination is made by a block 86 whether the loop
counter has reached a maximum limit MAX2. If so,
further processing is terminated and control passes to
the block 66 of Figure 3, which uses the values C_NEW
and F_NEW as estimates of the character and font
identities, respectively. On the other hand, if the block 86
determines that the loop counter has not reached the
maximum, a block 88 assigns the values C_NEW and
F_NEW to the variables C_OLD and F_OLD, respectively, and
control returns to the block 78 where new estimates
of the character identity and the font identity are
made.

Control remains with the blocks 78-88 until two 
successive identical character and font estimates 
have been obtained. In this event, control passes to 
a block 90, Figure 4B, which stores the current values 
of C_NEW and F_NEW and data representing the position
of the character on the page. A block 92 then checks 
to determine whether all characters on the page have 
been processed. If so, control passes to the block 66 
in Figure 3. Otherwise, a block 94 selects the next 
character on the page and control returns to the block 
76, Figure 4A, where estimates of the next character 
are made. 
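
The control flow of the blocks 76 to 88 may be sketched, purely for illustration, as the following Python function. The callable estimate is a hypothetical stand-in for the blocks 78 and 80 and returns a (character, font) pair; the names stable_estimate and max_loops are likewise assumptions of this sketch, with max_loops playing the role of MAX2.

```python
from typing import Callable, Optional, Tuple

Estimate = Tuple[str, str]  # (character identity, font identity)


def stable_estimate(estimate: Callable[[], Estimate],
                    max_loops: int) -> Tuple[Estimate, bool]:
    """Repeat estimate() until two successive results agree or the limit is hit.

    Returns (latest estimate, True if two successive estimates matched).
    """
    old: Optional[Estimate] = None   # plays the role of C_OLD / F_OLD
    new = estimate()                 # C_NEW / F_NEW from blocks 78 and 80
    loops = 1                        # loop counter M, initialised by block 76
    while new != old:                # block 82: are successive estimates equal?
        loops += 1                   # block 84
        if loops >= max_loops:       # block 86: limit reached, use latest values
            return new, False
        old = new                    # block 88
        new = estimate()             # back to blocks 78 and 80
    return new, True
```

The returned flag distinguishes a genuinely repeated estimate from one accepted only because the loop limit was reached, which a caller might wish to record for the later error check.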

As noted previously, once all the characters have 
been processed, the PDL expressions for the charac- 
ters are developed by the block 66 of Figure 3. 

Figure 5 illustrates the programming executed by 
the blocks 78 and 80 of Figure 4A in greater detail. A 
block 100 detects one or more characteristics (or 
"metrics") of the character currently under consider- 
ation. The metrics are used to identify a character and 
font, and include, but are not limited to, stroke width, 
reversals (i.e., the number of times a character outline 
changes direction), number of strokes per character 
(i.e., the number of separate nonintersecting outlines 
per character (e.g., the letter i has two strokes where- 
as the letter t has one stroke)), the outline accelera- 
tion (i.e., how fast the outline changes direction), the 
number of unconnected outline paths (e.g., the char- 
acter o has two such paths, the character i has two 
such paths and the character t has one such path), 
the length of each outline, the ratio of the number of 
white pixels to black pixels in a character block, the 
angle of the character and the like. The angle of the 
character is determined by selecting equally spaced 
points on the outline of the character, calculating the 
slopes of lines tangent to the points and using a
regression analysis to determine a line minimizing the
least-squares error between the calculated slopes and
the determined line.

Following the block 100, a block 102 stores the
detected character metrics in a memory of the
computer 16. A block 104 then compares the stored
metrics against a library of previously created metrics for
all fonts and characters which are to be searched.
These metrics are created using the same process
described previously and placed in the library.

A pair of blocks 106, 108 then select the closest
character and closest font based upon the comparison
conducted by the block 104. As previously noted,
these selections are estimates in the sense that there
may not be an exact match between the character
currently under consideration and the stored character
metrics. This variation can come about due to
variations in print quality, smudges, erasures or other
marks on the printed page or due to the fact that the
font in which the character is printed simply does not
have metrics matching any of the stored metrics.

Following the block 108, control passes to the 
block 82 of Figure 4A. 
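
The comparison and selection performed by the blocks 104 to 108 may be pictured as a nearest-match search over the metric library. The following Python sketch is illustrative only: the specification does not state how closeness is measured, so Euclidean distance over a fixed-length metric vector is an assumption, as are the function name closest_match and the library layout, and the character and font are selected jointly here rather than in two separate blocks.

```python
import math
from typing import Dict, Sequence, Tuple

Key = Tuple[str, str]  # (character, font)


def closest_match(metrics: Sequence[float],
                  library: Dict[Key, Sequence[float]]) -> Tuple[Key, float]:
    """Return the library entry whose metric vector is nearest to `metrics`.

    Distance is plain Euclidean distance, an assumption made for this sketch.
    The second element of the result is the distance to the chosen entry;
    a non-zero distance means the selection is an estimate, not an exact match.
    """
    best_key = min(library, key=lambda k: math.dist(metrics, library[k]))
    return best_key, math.dist(metrics, library[best_key])
```

A caller might use the returned distance to decide whether the estimate is close enough to trust or whether the recognition parameters should be changed on a later pass.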

Figure 6 illustrates the programming executed by
the block 72 of Figure 4A in greater detail. Following
the block 62 of Figure 3, a block 120 shifts the page
orientation to a normalized position, if necessary. A
block 122 then detects various page characteristics
or metrics and stores same in the memory of the
computer 16. These characteristics may include the page
size, margin sizes and the number of the page in the
scan sequence. A block 124 thereafter selects a first
pixel of the page and a block 126 determines the
boundaries of a box surrounding the pixel. This, in
turn, defines a block which is removed from the
bitmap by a block 128.

Following the block 128, a block 130 checks to
determine whether there are further pixels to be
processed. If so, a block 132 locates the next pixel on the
page and control returns to the blocks 126-130. The
foregoing process repeats until all pixels have been
processed. Once all pixels have been processed, a
block 134 sends each block to the font and character
recognition portion of the programming illustrated in
Figure 5.
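
One way to picture the block extraction of Figure 6 is sketched below in Python. The specification does not say how the box surrounding a pixel is found, so this sketch assumes, for illustration only, that the box encloses the 8-connected group of black pixels containing the selected pixel; the names bounding_box and extract_blocks are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]          # 1 = black (ink), 0 = white
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def bounding_box(bitmap: Bitmap, row: int, col: int) -> Box:
    """Box around the 8-connected group of black pixels containing (row, col)."""
    height, width = len(bitmap), len(bitmap[0])
    top = bottom = row
    left = right = col
    seen = {(row, col)}
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        for nr in range(max(r - 1, 0), min(r + 2, height)):
            for nc in range(max(c - 1, 0), min(c + 2, width)):
                if bitmap[nr][nc] and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    stack.append((nr, nc))
    return top, left, bottom, right


def extract_blocks(bitmap: Bitmap) -> List[Tuple[Box, Bitmap]]:
    """Cut every inked region out of a copy of the page, scanning row by row."""
    page = [list(r) for r in bitmap]               # leave the caller's bitmap intact
    blocks = []
    for r in range(len(page)):
        for c in range(len(page[r])):
            if page[r][c]:                         # next unprocessed black pixel
                top, left, bottom, right = bounding_box(page, r, c)
                block = [page[y][left:right + 1] for y in range(top, bottom + 1)]
                for y in range(top, bottom + 1):   # remove the block from the bitmap
                    for x in range(left, right + 1):
                        page[y][x] = 0
                blocks.append(((top, left, bottom, right), block))
    return blocks
```

Under this assumption a character with unconnected parts, such as the letter i, would yield two blocks, so a practical implementation would need a further grouping step that is not shown here.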

Figure 7 illustrates the programming executed by
the blocks 62 and 66 of Figure 3 in greater detail. A
block 140 permits selection of a particular page
description language by the operator. In the preferred
embodiment, as noted above, the page description
language comprises PostScript, although a different
language may alternatively be selected. A block 142
then generates the appropriate PDL commands and
control continues to the block 64 or 68 of Figure 3.
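
As a hedged illustration of the kind of output the block 142 might generate for text, the following Python sketch turns stored character, font and position data into PostScript using the standard operators findfont, scalefont, setfont, moveto, show and showpage. The record layout, the inclusion of a point size and the function names are assumptions of this sketch and are not taken from the specification.

```python
from typing import Iterable, Tuple

# (character, PostScript font name, point size, x, y), with x and y in points
Glyph = Tuple[str, str, float, float, float]


def ps_escape(text: str) -> str:
    """Escape the characters that are special inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def glyphs_to_postscript(glyphs: Iterable[Glyph]) -> str:
    """Emit a font selection when the font changes and one moveto/show per glyph."""
    lines = ["%!PS"]
    current_font = None
    for char, font, size, x, y in glyphs:
        if (font, size) != current_font:           # switch fonts only when needed
            lines.append(f"/{font} findfont {size:g} scalefont setfont")
            current_font = (font, size)
        lines.append(f"{x:g} {y:g} moveto ({ps_escape(char)}) show")
    lines.append("showpage")
    return "\n".join(lines) + "\n"
```

Placing each character with its own moveto keeps the stored page positions authoritative instead of relying on the font's advance widths, which suits the aim of reproducing the appearance of the original page.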

Figure 8 illustrates the programming executed by
the block 36 of Figure 2A and the block 58 of Figure
2B in greater detail. A block 150 selects an area of the
scanned page to process. This area would preferably
exclude images which are unrecognizable by the
recognition process. Following the block 150, a pair of
blocks 152, 154 allow automatic or operator-instituted
specification of page and process parameters,
respectively. Page parameters include, for example,
the page size, page orientation (i.e., either portrait or
landscape), line spacing, an approximation of character
point size and margins. Process parameters
include error thresholds, an indication of the number of
passes through the programming before each page is
considered "completed", the amount by which
recognition parameters may change before the recognition
process ends, and the like.

Following the block 154, control passes to the ap- 
propriate block 38 or 34 of Figure 2A. 

Figure 9 illustrates the programming executed by
the block 48 of Figure 2B. The programming illustrated
in Figure 9 detects errors using a kernel calculated
for each pixel. More particularly, Figure 10 illustrates
a portion of the error bitmap calculated by subtracting
the Nth approximation bitmap file from the original
bitmap file as executed by the block 46. A block 160
of Figure 9 establishes, in the preferred embodiment,
a 3x3 matrix of error bitmap values surrounding each
pixel and further establishes coefficients for each bit
in the bitmap matrix. Thus, for example, as seen in
Figure 10, a 3x3 matrix is established surrounding a
particular error bit 162. The values stored in the
matrix are in turn multiplied by kernel coefficients which,
in the preferred embodiment, are all equal to 0.4, and
the resulting multiplied values are summed together
to obtain a kernel value. In the case of an error bit
162, the kernel value is equal to 0.4, owing to a "1"
stored as an error bit 164 and zeroes in the remaining
bits of the 3x3 matrix.

As a further example, where the kernel value for
a bit 166 is to be calculated, the values in the 3x3
matrix surrounding such bit are multiplied by the kernel
coefficients and the resulting values are added
together to arrive at a value of 0.4 + 0.4 + 0.4 + 0.4 = 1.6.
A block 168 performs the foregoing kernel calculation
and a block 170 compares each kernel value against
an operator-specified limit. If the limit is exceeded a
certain number of times, then the error is determined
to be unacceptable by the block 50 and control passes
to the block 54 of Figure 2B. On the other hand, if
fewer than the certain number of kernel values exceed
the operator-specified limit, then the errors are
considered to be acceptable and control passes to the
block 52 of Figure 2B.

As an alternative to the foregoing operation, the
kernel values for the entire page may be summed and
compared against an operator-specified limit. In this
case, if the limit is exceeded, the block 50 passes
control to block 54 for further processing. If, however,
the total of the kernel values is less than the
operator-specified limit, then the errors are considered
acceptable by the block 50 and control passes to the block
52.
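
The kernel calculation and the two acceptance tests described above may be sketched in Python as follows. The uniform coefficient of 0.4 and the 3x3 neighbourhood are taken from the description; the treatment of neighbours lying outside the bitmap as zero, and the function names, are assumptions made for this illustration.

```python
from typing import List

Bitmap = List[List[int]]


def kernel_values(error: Bitmap, coefficient: float = 0.4) -> List[List[float]]:
    """Weighted sum of the 3x3 neighbourhood around each error bit.

    Neighbours falling outside the bitmap are treated as zero (an assumption).
    """
    height, width = len(error), len(error[0])
    values = []
    for r in range(height):
        row = []
        for c in range(width):
            total = sum(error[y][x]
                        for y in range(max(r - 1, 0), min(r + 2, height))
                        for x in range(max(c - 1, 0), min(c + 2, width)))
            row.append(coefficient * total)
        values.append(row)
    return values


def too_many_exceed(values: List[List[float]], limit: float, max_count: int) -> bool:
    """First test: unacceptable if at least max_count kernel values exceed the limit."""
    return sum(v > limit for row in values for v in row) >= max_count


def total_exceeds(values: List[List[float]], limit: float) -> bool:
    """Alternative test: unacceptable if the page-wide sum exceeds the limit."""
    return sum(map(sum, values)) > limit
```

Because every coefficient is equal, an isolated error bit contributes only 0.4 to each neighbouring kernel value while a cluster of four adjacent error bits produces 1.6, so a suitably chosen limit lets scattered single-pixel noise pass while flagging clustered errors.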



If desired, a different error detection scheme may 
be utilized, as should be evident to one of ordinary 
skill in the art. 

As is evident from the foregoing, the present in- 
vention is useful to convert a scanned page into PDL 
expressions. This is particularly useful to convert old 
printed material into a form for electronic publication 
on CD-ROM or using multi-media. Also, the present 
invention is capable of recognizing all errors in the 
conversion process, in turn potentially allowing error 
free recognition. 

Numerous modifications and alternative embodiments
of the invention will be apparent to those skilled
in the art in view of the foregoing description.
Accordingly, this description is to be construed as
illustrative only and is for the purpose of teaching those
skilled in the art the best mode of carrying out the in- 
vention. The details of the structure may be varied 
substantially without departing from the spirit of the 
invention, and the exclusive use of all modifications 
which come within the scope of the appended claims 
is reserved. 



Claims 

1. A method of converting an original representation
of a page element expressed in bitmap form into 
a page definition language representation of the 
page element, comprising the steps of: 

(a.) establishing a first set of recognition para- 
meters; 

(b.) using the established set of recognition 
parameters to convert the original represen- 
tation into an element approximation ex- 
pressed in the page definition language; 
(c.) converting the element approximation 
into an approximation bitmap; 
(d.) comparing the approximation bitmap to 
the original representation expressed in bit- 
map form to obtain an error indication; 
(e.) determining whether the error indication 
meets a certain criterion; 
(f.) using the element approximation as the 
page definition language representation if the 
error indication meets the certain criterion; or 
(g.) repeating steps (b.) through (f.) at least 
once if the error indication fails to meet the 
certain criterion, using at least one different 
established set of recognition parameters until 
an element approximation is obtained that re- 
sults in an error indication which meets the 
certain criterion. 

2. A method according to claim 1, wherein the page
element comprises a character having an identity
and expressed in a font and wherein the step (b.)
includes the step of estimating the identity of the
character and the font.

3. A method according to claim 2, wherein the step
of estimating the identity of the character and the
font includes the step of comparing the original
representation bitmap with stored character bitmaps.

4. A method according to any preceding claim,
wherein the step (d.) comprises the step of subtracting
the approximation bitmap from the original
representation bitmap to obtain the error indication.

5. A method of converting a bitmap representation
of a character expressed in a font into a page
definition language expression, comprising the
steps of:

(a.) detecting a characteristic of the character;
(b.) using the detected characteristic to obtain
one estimate of the identity of the character
and the font;
(c.) using the detected characteristic to obtain
another estimate of the identity of the character
and the font to thereby develop successive
estimates;
(d.) determining whether the successive estimates
are the same;
(e.) developing a page definition language
expression of the character and font using at
least one of the estimates if the successive
estimates are the same; or
(f.) repeating steps (c.) through (e.) at least
once if the successive estimates are not the
same until two successively obtained estimates
are the same.

6. A method according to claim 5, wherein the step
(a.) includes the step of detecting character metrics.

7. A method according to claim 6, wherein each of
the steps (b.) and (c.) includes the step of comparing
the detected character metrics with stored
character metrics.

8. A method of reproducing a plurality of printed
characters each printed in a font at a position on
a page, comprising the steps of:

(a.) converting the printed characters into a
bitmap representation of same;
(b.) selecting a first character;
(c.) detecting characteristics of the character;
(d.) using the detected characteristics to develop
character and font data representing
the identity of the character and the font in
which the character is expressed;
(e.) storing the character and font data together
with position data representing the position
of the character on the page;
(f.) repeating steps (c.) - (e.) for remaining
characters on the page;
(g.) converting the stored character and font
data and the stored position data into page
definition language expressions; and
(h.) using the page definition language
expressions to operate a printing device so that
the printing device produces a printed page.

9. A method according to claim 8, wherein the step 
(a.) comprises the step of using a scanner (22) to 
convert the printed characters into the bitmap 
representation. 

10. A method according to claim 8 or claim 9, includ- 
ing the further step of dividing the bitmap repre- 
sentation into a plurality of blocks, each of which 
includes a character. 

11. A method according to any of claims 8 to 10, 
wherein a nontext portion is printed at a certain 
position on the page and including the further 
steps of converting the nontext portion into a fur- 
ther page definition language expression and 
merging the further page definition language ex- 
pression with the page definition language ex- 
pressions obtained in step (g.). 

12. A method according to any preceding claim, 
wherein the page definition language is Post- 
Script. 

13. Apparatus capable of commanding a printing de- 
vice (14) to reproduce a page (12) having a plur- 
ality of characters (42) printed thereon, wherein 
each character has an identity and is printed in a 
font and wherein the printed page is represented 
by a bitmap representation, comprising: 

means for detecting metrics of each char- 
acter of the bitmap representation; 

means responsive to the detecting means 
for obtaining an estimate of each character in- 
cluding the identity thereof and the font in which 
such character is printed; 

means responsive to the obtaining means 
for comparing the estimates of the characters 
with the bitmap representation to obtain an error 
indication; 

means responsive to the comparing 
means for successively correcting character es- 
timates until the error indication meets a certain 
criterion; and 

means for assembling printing device 
commands in a page definition language using 
the character estimates. 



FIG. 1 (simplified block diagram). Surviving labels: printed page 12, KEYBOARD 18, OUTPUT DEVICE(S) 14.



FIG. 2A (flowchart). Surviving labels: START; N = 1 (30); OBTAIN BITMAP FILE (32); SET INITIAL RECOGNITION PARAMETERS; CONVERT ORIGINAL BITMAP FILE TO Nth PDL FILE (38); CONVERT Nth PDL FILE TO Nth APPROXIMATION BITMAP FILE; connector AA.



FIG. 2B (flowchart). Surviving labels: SUBTRACT Nth APPROXIMATION BITMAP FILE FROM ORIGINAL BITMAP FILE (46); PERFORM ERROR DETECTION; MODIFY RECOGNITION PARAMETERS; STORE OR USE Nth PDL FILE (52).



FIG. 3 (flowchart). Surviving labels: FROM BLOCK 36 OR 34, FIG. 2A; DETERMINE IMAGE BLOCKS (60); GENERATE IMAGE PAGE DEFINITION LANGUAGE (62); PERFORM FONT AND CHARACTER RECOGNITION (64); GENERATE TEXT PAGE DEFINITION LANGUAGE (66); MERGE TEXT PDL WITH IMAGE PDL (68); OUTPUT PDL FILE (70); TO BLOCK AA, FIG. 2A.



FIG. 4A (flowchart, entered from block 62, FIG. 3). Blocks, in order: GENERATE CHARACTER BLOCKS; SELECT FIRST CHARACTER; F NEW = F OLD = 0, C NEW = C OLD = 0, M = 1; ESTIMATE CHARACTER IDENTITY, C NEW = C EST; ESTIMATE FONT IDENTITY, F NEW = F EST. Reference numeral legible: 72.



FIG. 4B (flowchart, continued from FIG. 4A). Legible blocks: SAVE C NEW, F NEW AND CHARACTER POS.; GO TO BLOCK 66, FIG. 3; SELECT NEXT CHARACTER. Reference numeral legible: 90.

FIG. 10. A grid of 0 and 1 values whose layout is not recoverable. Reference numerals legible: 162, 164, 166.



FIG. 5 (flowchart, entered from block 76, FIG. 4A). Blocks, in order: DETECT CHARACTER METRICS (100); STORE CHARACTER METRICS (102); PERFORM METRIC COMPARISON (104); SELECT CLOSEST CHARACTER (106); SELECT CLOSEST FONT (108). Exit: TO BLOCK 82, FIG. 4A.



FIG. 6 (flowchart, entered from block 62, FIG. 3). Blocks, in order: NORMALIZE PAGE ORIENTATION; DETECT PAGE METRICS; SELECT FIRST PIXEL; DETERMINE BOUNDING BOX; REMOVE BLOCK FROM BITMAP; SEND EACH BLOCK TO FONT/CHARACTER RECOGNITION (FIG. 5); END. Reference numerals as printed: 120, 22, 124, 128, 132, 134.
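
The segmentation steps named in FIG. 6 (select first pixel, determine bounding box, remove block from bitmap, repeat) can be sketched as below. The drawing does not say how the bounding box is found, so this example assumes a 4-connected flood fill from the first set pixel; that choice, the function names and the box convention (top, left, bottom, right, inclusive) are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]
Box = Tuple[int, int, int, int]  # top, left, bottom, right (inclusive)


def first_set_pixel(bitmap: Bitmap) -> Optional[Tuple[int, int]]:
    """Return (row, column) of the first set pixel in raster order, if any."""
    for y, row in enumerate(bitmap):
        for x, value in enumerate(row):
            if value:
                return (y, x)
    return None


def bounding_box(bitmap: Bitmap, seed: Tuple[int, int]) -> Box:
    """Flood-fill the 4-connected region containing seed and return its box."""
    height, width = len(bitmap), len(bitmap[0])
    stack = [seed]
    seen = {seed}
    top = bottom = seed[0]
    left = right = seed[1]
    while stack:
        y, x = stack.pop()
        top, bottom = min(top, y), max(bottom, y)
        left, right = min(left, x), max(right, x)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < height and 0 <= nx < width
                    and bitmap[ny][nx] and (ny, nx) not in seen):
                seen.add((ny, nx))
                stack.append((ny, nx))
    return (top, left, bottom, right)


def extract_blocks(bitmap: Bitmap) -> List[Box]:
    """Repeatedly box the first remaining pixel's region and erase that box."""
    work = [row[:] for row in bitmap]  # leave the caller's bitmap untouched
    boxes: List[Box] = []
    while True:
        seed = first_set_pixel(work)
        if seed is None:
            return boxes
        top, left, bottom, right = bounding_box(work, seed)
        boxes.append((top, left, bottom, right))
        for y in range(top, bottom + 1):      # remove block from bitmap
            for x in range(left, right + 1):
                work[y][x] = 0
```

Each returned box would then be passed to font and character recognition. A 4-connected fill treats detached parts of a character, such as the dot of an "i", as separate blocks, so a practical system would need a merging step that this sketch omits.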



FIG. 7 (flowchart, entered from block 60 or 64, FIG. 3). Blocks, in order: SELECT PAGE DESCRIPTION LANGUAGE (140); GENERATE PDL COMMANDS (142). Exit: GO TO BLOCKS 64 OR 68, FIG. 3.

FIG. 8 (flowchart, entered from block 34, FIG. 2A or block 56, FIG. 2B). Blocks, in order: SELECT AREA OF SCANNED PAGE TO PROCESS; SPECIFY PAGE PARAMETERS; SPECIFY PROCESS PARAMETERS. Exit: TO BLOCK 38, FIG. 2A OR BLOCK 34, FIG. 2A.

FIG. 9 (flowchart, entered from block 46, FIG. 2B). Blocks, in order: ESTABLISH/LOAD KERNEL COEFFICIENTS; PROCESS BITMAP FILE; DETERMINE ERROR STATISTICS. Exit: TO BLOCK 50, FIG. 2B.

Reference numerals legible across FIGS. 8 and 9: 150, 152, 154, 160, 168, 170.




Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) Publication number : 0 582 490 A3

EUROPEAN PATENT APPLICATION

(21) Application number : 93306265.5
(22) Date of filing : 09.08.93

(51) Int. Cl.5 : G06K 9/00, G06K 9/48

(30) Priority : 07.08.92 US 926198

(43) Date of publication of application :
09.02.94 Bulletin 94/06

(84) Designated Contracting States :
DE FR GB

(88) Date of deferred publication of search report :
15.02.95 Bulletin 95/07

(71) Applicant : R.R. DONNELLEY & SONS COMPANY
77 West Wacker Drive
Chicago, Illinois 60601-1696 (US)

(72) Inventor : Bengtson, Michael
22913 East Drive
Richton Park 60471, Illinois (US)

(74) Representative : Marshall, John
SERJEANTS
25 The Crescent
King Street
Leicester LE1 6RX (GB)

(54) Converting bitmap data into page definition language commands.




Jouve, 18, rue Saint-Denis, 75001 PARIS



EP 0 582 490 A3 



European Patent 
Office 



EUROPEAN SEARCH REPORT 



EP 93 30 6265 



DOCUMENTS CONSIDERED TO BE RELEVANT

Columns: Category; Citation of document with indication, where appropriate, of relevant passages; Relevant to claim; CLASSIFICATION OF THE APPLICATION (Int.Cl.5)

Citations:

DATABASE WPI
Week 9221, Derwent Publications Ltd., London, GB;
AN 92-169948
& JP-A-4 047 756 (RICOH KK) 17 February 1992
* abstract *
Relevant to claim: 5-13

PATENT ABSTRACTS OF JAPAN
vol. 13, no. 460 (P-946) 18 October 1989
& JP-A-01 180 083 (NEC CORP) 18 July 1989
* abstract *

PATENT ABSTRACTS OF JAPAN
vol. 16, no. 166 (E-1193) 22 April 1992
& JP-A-04 013 369 (RICOH CO LTD) 17 January 1992
* abstract *

PATENT ABSTRACTS OF JAPAN
vol. 10, no. 278 (P-499) 20 September 1986
& JP-A-61 098 487 (RICOH CO LTD) 16 May 1986
* abstract *

DE-A-134 997 (SIEMENS AKTIENGESELLSCHAFT BERLIN UND MÜNCHEN)
* the whole document *

Relevant to claim, remaining column values in source order: 1-4; 5-13; 1.5,8,1- [remainder illegible]; 5-13; 5-13

CLASSIFICATION OF THE APPLICATION (Int.Cl.5): G06K9/00, G06K9/48

TECHNICAL FIELDS SEARCHED (Int.Cl.5): G06K



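The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 20 December 1994
Examiner: Suendernann, R

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document

T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons

& : member of the same patent family, corresponding document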



